[SPARK-4163][Core][WebUI] Send the fetch failure message back to Web UI #3032

Closed
wants to merge 8 commits into from

zsxwing
Member

@zsxwing zsxwing commented Oct 31, 2014

This is a PR to send the fetch failure message back to Web UI.
Before: [screenshots: f1, f2]

After (please ignore the content of the exception; I threw it directly in the code because a fetch failure is hard to simulate): [screenshots: e1, e2]

SparkQA commented Oct 31, 2014

Test build #22594 has started for PR 3032 at commit c261d23.

  • This patch merges cleanly.

SparkQA commented Oct 31, 2014

Test build #22600 has started for PR 3032 at commit a3bca65.

  • This patch merges cleanly.

SparkQA commented Oct 31, 2014

Test build #22594 has finished for PR 3032 at commit c261d23.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait FetchResult
    • case class SuccessFetchResult(blockId: BlockId, size: Long, buf: ManagedBuffer)
    • case class FailureFetchResult(blockId: BlockId, e: Throwable) extends FetchResult
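
The class list above suggests a small result hierarchy: a fetch either succeeds with a buffer or fails with a Throwable. A minimal sketch of how such a hierarchy might be consumed is below. `BlockId` and `ManagedBuffer` are simplified stand-ins rather than Spark's real classes, `describe` is a hypothetical helper, and having `SuccessFetchResult` extend `FetchResult` is an assumption, since the bot output does not show it.

```scala
// Stand-ins for Spark's BlockId and ManagedBuffer (assumptions for illustration only).
final case class BlockId(name: String)
final case class ManagedBuffer(bytes: Array[Byte])

// A fetch either succeeds with data or fails with the exception that caused it.
sealed trait FetchResult { def blockId: BlockId }
final case class SuccessFetchResult(
    blockId: BlockId, size: Long, buf: ManagedBuffer) extends FetchResult
final case class FailureFetchResult(blockId: BlockId, e: Throwable) extends FetchResult

// Hypothetical helper: turn a result into a one-line summary.
def describe(result: FetchResult): String = result match {
  case SuccessFetchResult(id, size, _) => s"fetched ${id.name} ($size bytes)"
  case FailureFetchResult(id, e) => s"failed to fetch ${id.name}: ${e.getMessage}"
}
```

Because the failure case keeps the original Throwable, a consumer can decide later how much of it to surface.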

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22594/

zsxwing changed the title from "[SPARK-4163][Core][Web UI] Send the fetch failure message back to Web UI" to "[SPARK-4163][Core][WebUI] Send the fetch failure message back to Web UI" on Oct 31, 2014

SparkQA commented Oct 31, 2014

Test build #22600 has finished for PR 3032 at commit a3bca65.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait FetchResult
    • case class SuccessFetchResult(blockId: BlockId, size: Long, buf: ManagedBuffer)
    • case class FailureFetchResult(blockId: BlockId, e: Throwable) extends FetchResult

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22600/

SparkQA commented Nov 1, 2014

Test build #22681 has started for PR 3032 at commit 0c07d1f.

  • This patch merges cleanly.

SparkQA commented Nov 1, 2014

Test build #22681 has finished for PR 3032 at commit 0c07d1f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait FetchResult
    • case class SuccessFetchResult(blockId: BlockId, size: Long, buf: ManagedBuffer)
    • case class FailureFetchResult(blockId: BlockId, e: Throwable) extends FetchResult

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22681/

extends Exception {

override def getMessage: String =
"Fetch failed: %s %d %d %d".format(bmAddress, shuffleId, mapId, reduceId)

- def toTaskEndReason: TaskEndReason = FetchFailed(bmAddress, shuffleId, mapId, reduceId)
+ def toTaskEndReason: TaskEndReason =
Contributor

This may fit within 100 characters on one line.

@aarondav
Contributor

aarondav commented Nov 1, 2014

LGTM, only had superficial comments.

@zsxwing
Member Author

zsxwing commented Nov 1, 2014

@aarondav thanks for reviewing it. Updated as per your comments.

SparkQA commented Nov 1, 2014

Test build #22696 has started for PR 3032 at commit 62103fd.

  • This patch merges cleanly.

SparkQA commented Nov 1, 2014

Test build #22696 has finished for PR 3032 at commit 62103fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • sealed trait FetchResult
    • sealed case class SuccessFetchResult(blockId: BlockId, size: Long, buf: ManagedBuffer)
    • sealed case class FailureFetchResult(blockId: BlockId, e: Throwable) extends FetchResult

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22696/

- case class FetchResult(blockId: BlockId, size: Long, buf: ManagedBuffer) {
-   def failed: Boolean = size == -1
-   if (failed) assert(buf == null) else assert(buf != null)
+ sealed case class SuccessFetchResult(blockId: BlockId, size: Long, buf: ManagedBuffer)
Contributor

ah, "sealed" is usually only applied to traits, as case classes themselves are not supposed to be extended.

Member Author

Already replaced sealed with private[storage]. Thank you.
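
The point about `sealed` can be illustrated with a short sketch. It assumes an enclosing object named `storage` as a stand-in for the real package, uses `String` in place of `BlockId` and `Array[Byte]` in place of `ManagedBuffer`, and `summarize` is a hypothetical helper. Sealing the trait is what lets the compiler check that a match covers every case, while `private[storage]` keeps the case classes out of the public API.

```scala
// `storage` stands in for the real package (an assumption so the sketch is self-contained).
object storage {
  // Sealed: all direct subtypes must be declared in this file.
  private[storage] sealed trait FetchResult

  // Restricted visibility replaces `sealed` on the case classes themselves.
  private[storage] case class SuccessFetchResult(
      blockId: String, size: Long, buf: Array[Byte]) extends FetchResult
  private[storage] case class FailureFetchResult(
      blockId: String, e: Throwable) extends FetchResult

  // Hypothetical public entry point; the result types never leak outside `storage`.
  def summarize(blockId: String, outcome: Either[Throwable, Array[Byte]]): String = {
    val result: FetchResult = outcome match {
      case Right(bytes) => SuccessFetchResult(blockId, bytes.length.toLong, bytes)
      case Left(error) => FailureFetchResult(blockId, error)
    }
    // Because FetchResult is sealed, the compiler can warn if a case is missing here.
    result match {
      case SuccessFetchResult(id, size, _) => s"$id: ok ($size bytes)"
      case FailureFetchResult(id, e) => s"$id: failed (${e.getMessage})"
    }
  }
}
```

Callers outside `storage` can only reach `summarize`; attempting to construct `storage.SuccessFetchResult` from elsewhere would not compile.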

@zsxwing
Member Author

zsxwing commented Nov 2, 2014

How about now?

SparkQA commented Nov 2, 2014

Test build #22753 has started for PR 3032 at commit 316767d.

  • This patch merges cleanly.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22752/

SparkQA commented Nov 2, 2014

Test build #22757 has started for PR 3032 at commit 4e946f7.

  • This patch merges cleanly.

SparkQA commented Nov 2, 2014

Test build #22753 has finished for PR 3032 at commit 316767d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22753/

SparkQA commented Nov 2, 2014

Test build #22757 timed out for PR 3032 at commit 4e946f7 after a configured wait of 120m.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22757/

@JoshRosen
Contributor

Jenkins, retest this please.

SparkQA commented Nov 2, 2014

Test build #22764 has started for PR 3032 at commit 4e946f7.

  • This patch merges cleanly.

SparkQA commented Nov 2, 2014

Test build #22764 timed out for PR 3032 at commit 4e946f7 after a configured wait of 120m.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22764/

  blockId match {
    case ShuffleBlockId(shufId, mapId, _) =>
      val address = statuses(mapId.toInt)._1
-     throw new FetchFailedException(address, shufId.toInt, mapId.toInt, reduceId)
+     throw new FetchFailedException(address, shufId.toInt, mapId.toInt, reduceId, e)
Contributor

I tested this patch out locally, and it appears that, sadly, the cause is actually not propagated correctly back to the driver. I think we can revert this to the prior Utils.exceptionString() way, that looks good in the UI.
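
For readers following along, the string-based approach mentioned here might look roughly like the sketch below. This is an illustration only, not the code in this patch: the `FetchFailed` case class is a simplified stand-in for Spark's task end reason, `bmAddress` is a plain `String` rather than a `BlockManagerId`, and the `message` parameter is assumed to hold an already-rendered exception string.

```scala
// Simplified stand-in for Spark's task end reason (an assumption for illustration).
final case class FetchFailed(
    bmAddress: String, shuffleId: Int, mapId: Int, reduceId: Int, message: String)

// The exception carries an already-rendered message rather than a Throwable cause,
// so whatever is reported onward is plain text.
class FetchFailedException(
    bmAddress: String,
    shuffleId: Int,
    mapId: Int,
    reduceId: Int,
    message: String) extends Exception(message) {

  def toTaskEndReason: FetchFailed =
    FetchFailed(bmAddress, shuffleId, mapId, reduceId, message)
}
```

The design choice in this sketch is that the text is fixed at the point of failure, so nothing downstream needs access to the original Throwable.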

Member Author

I will revert to the Utils.exceptionString() way.

It looks like Utils.exceptionString() does not handle nested exceptions like this one. It only outputs the outermost exception's info. Is that intentional?

Contributor

That is almost certainly not intentional. I think the method was introduced rather recently, and does not have many users right now. It should be fixed.

Member Author

I will try to fix it in another PR.
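
To make the nested-exception issue in this thread concrete, here is a minimal sketch contrasting two ways of rendering a Throwable as text. Neither function is Spark's actual `Utils.exceptionString`; `outermostOnly` and `withCauses` are hypothetical names. The first formats only the outermost exception and its own stack frames, which drops any cause. The second delegates to `Throwable.printStackTrace`, which appends a `Caused by:` section for each cause in the chain.

```scala
import java.io.{PrintWriter, StringWriter}

// Renders only the outermost exception: its toString plus its own stack frames.
def outermostOnly(e: Throwable): String =
  e.toString + "\n" + e.getStackTrace.mkString("\n")

// Renders the full chain: printStackTrace also writes each cause under "Caused by:".
def withCauses(e: Throwable): String = {
  val buffer = new StringWriter()
  val writer = new PrintWriter(buffer)
  e.printStackTrace(writer)
  writer.flush()
  buffer.toString
}
```

For example, given `new RuntimeException("fetch failed", new java.io.IOException("connection reset"))`, only the output of `withCauses` mentions the `IOException`.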

@zsxwing
Member Author

zsxwing commented Nov 3, 2014

New screenshots for the current code (I threw the exception directly in onBlockFetchSuccess to simulate an error):

Stages page: [screenshot: stags]

Detail for stage: [screenshot: stage]

SparkQA commented Nov 3, 2014

Test build #22800 has started for PR 3032 at commit f7e1faf.

  • This patch merges cleanly.

@aarondav
Contributor

aarondav commented Nov 3, 2014

LGTM, will merge as soon as Jenkins passes.

SparkQA commented Nov 3, 2014

Test build #22800 has finished for PR 3032 at commit f7e1faf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22800/

@aarondav
Contributor

aarondav commented Nov 3, 2014

Merging into master. Thanks for this!

5 participants